Gaussian (Normal) Distribution

What is the Gaussian Distribution?

The Gaussian distribution, also known as the normal distribution, is a continuous probability distribution that is symmetric about its mean. It describes how values of a variable are distributed in many natural phenomena such as heights, test scores, and measurement errors.

The probability density function (PDF) of the normal distribution is given by:

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$
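This density can be evaluated directly with NumPy. The sketch below is illustrative; the function name `normal_pdf` and the default parameters are my own, not part of any library.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal PDF f(x) at x; mu and sigma default to the standard normal."""
    coeff = 1.0 / np.sqrt(2 * np.pi * sigma**2)
    return coeff * np.exp(-((x - mu) ** 2) / (2 * sigma**2))

# The density peaks at the mean, where f(mu) = 1 / (sigma * sqrt(2*pi)).
print(normal_pdf(0.0))                      # ≈ 0.3989 for the standard normal
print(normal_pdf(1.0, mu=1.0, sigma=2.0))   # ≈ 0.1995 (wider curve, lower peak)
```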

Gaussian vs Normal Distribution

There is no difference between a Gaussian distribution and a normal distribution. Both terms refer to the same concept. The term "Gaussian" comes from the mathematician Carl Friedrich Gauss, who studied this distribution in depth. "Normal" is a more general term used in statistics to describe its common appearance in natural data.

In summary, they are two names for the same bell-shaped distribution.

Choosing the Number of Intervals (Bins)

To choose how many intervals (bins) to use and their width when grouping continuous data from a normal distribution, several statistical rules are commonly used.


✅ 1. Sturges’ Rule

Useful for approximately normal distributions and medium-sized datasets.

Number of intervals (rounded up to an integer in practice):

\[ k = 1 + \log_2(n) \]

Width of each interval:

\[ h = \frac{\max(x) - \min(x)}{k} \]
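A minimal sketch of Sturges' rule; the helper name `sturges_bins` is mine, and the rounding up to an integer count is the usual practical convention.

```python
import math

def sturges_bins(x):
    """Sturges' rule: k = ceil(1 + log2(n)); the width h spans the data range."""
    n = len(x)
    k = math.ceil(1 + math.log2(n))
    h = (max(x) - min(x)) / k
    return k, h

# For n = 100: 1 + log2(100) ≈ 7.64, so k = 8 bins.
print(sturges_bins(list(range(100))))
```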


✅ 2. Square-root Rule

Number of intervals:

\[ k = \sqrt{n} \]

Width:

\[ h = \frac{\max(x) - \min(x)}{k} \]
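The same idea for the square-root rule, again with an illustrative helper name of my own:

```python
import math

def sqrt_rule_bins(x):
    """Square-root rule: k = ceil(sqrt(n)); width h spans the data range."""
    n = len(x)
    k = math.ceil(math.sqrt(n))
    h = (max(x) - min(x)) / k
    return k, h

# For n = 50: sqrt(50) ≈ 7.07, so k = 8 bins.
print(sqrt_rule_bins(list(range(50))))
```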


✅ 3. Scott’s Rule

Scott’s rule minimizes the integrated mean squared error of the histogram as a density estimate when the data are normally distributed.

Interval width:

\[ h = \frac{3.5 \,\sigma}{n^{1/3}} \]

Number of intervals:

\[ k = \frac{\max(x)-\min(x)}{h} \]


✅ 4. Freedman–Diaconis Rule

Uses the interquartile range (IQR) instead of the standard deviation, making it robust to outliers.

Interval width:

\[ h = \frac{2 \cdot IQR}{n^{1/3}} \]

Number of intervals:

\[ k = \frac{\max(x) - \min(x)}{h} \]
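A sketch of the Freedman–Diaconis rule using NumPy's percentile function to get the IQR; the function name `fd_bins` is mine.

```python
import numpy as np

def fd_bins(x):
    """Freedman–Diaconis rule: h = 2*IQR / n^(1/3), robust to outliers."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    h = 2 * iqr / len(x) ** (1 / 3)
    k = int(np.ceil((x.max() - x.min()) / h))
    return k, h

print(fd_bins(np.arange(1000)))
```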


🔍 Which Rule Should You Use?

| Your Situation | Best Method |
|---|---|
| Data is normal + medium/large sample | Scott |
| Data is normal + small/medium sample | Sturges |
| Simple/quick binning | Square root |
| Outliers or heavy tails | Freedman–Diaconis |
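NumPy implements all four of these estimators (plus an `"auto"` option) directly via `np.histogram_bin_edges`, so in practice you rarely need to code them by hand. The seeded sample data below is only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                 # seeded sample data for illustration
x = rng.normal(loc=0.0, scale=1.0, size=500)

# NumPy names these estimators: "sturges", "sqrt", "scott", "fd", "auto".
for rule in ("sturges", "sqrt", "scott", "fd", "auto"):
    edges = np.histogram_bin_edges(x, bins=rule)
    print(f"{rule:8s} -> {len(edges) - 1} bins, width = {edges[1] - edges[0]:.3f}")
```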

📌 Practical Example (Python)


    import numpy as np

    # `values` stands in for your own data; here we use a seeded normal sample.
    rng = np.random.default_rng(42)
    values = rng.normal(loc=0.0, scale=1.0, size=200)

    x = np.asarray(values)
    n = len(x)
    sigma = np.std(x, ddof=1)          # sample standard deviation

    # Scott's rule: h = 3.5 * sigma / n^(1/3)
    h = 3.5 * sigma / (n ** (1 / 3))
    k = int(np.ceil((x.max() - x.min()) / h))

    print("Number of bins =", k)
    print("Width =", h)

📚 Summary

Number of intervals:

\[ k = \begin{cases} 1 + \log_2(n) & \text{Sturges} \\ \sqrt{n} & \text{Square root} \\ \frac{\max(x)-\min(x)}{3.5\sigma n^{-1/3}} & \text{Scott} \\ \frac{\max(x)-\min(x)}{2\,IQR\,n^{-1/3}} & \text{F-D} \end{cases} \]

Interval width:

\[ h = \frac{\max(x)-\min(x)}{k} \]
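The four rules above can be collected into a single dispatcher. This is an illustrative sketch, not a standard API; the function name `choose_bins` and the rule labels are my own.

```python
import numpy as np

def choose_bins(x, rule="scott"):
    """Return (k, h) for one of the binning rules described above (illustrative)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    data_range = x.max() - x.min()
    if rule == "sturges":
        k = int(np.ceil(1 + np.log2(n)))
        return k, data_range / k
    if rule == "sqrt":
        k = int(np.ceil(np.sqrt(n)))
        return k, data_range / k
    if rule == "scott":
        h = 3.5 * np.std(x, ddof=1) / n ** (1 / 3)
    elif rule == "fd":
        q75, q25 = np.percentile(x, [75, 25])
        h = 2 * (q75 - q25) / n ** (1 / 3)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.ceil(data_range / h)), h

print(choose_bins(list(range(100)), rule="sturges"))
```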
